perm filename CHAP6[4,KMC]18 blob sn#068154 filedate 1973-10-22 generic text, type T, neo UTF8
00100			VALIDATION
00200	
00300	6.1 SOME TESTS
00400	
00500		The term "validate" derives from the Latin VALIDUS, strong.
00600	Thus to validate X means to strengthen it. In science this usually
00700	means to strengthen X's acceptability as a hypothesis, theory, or
00800	model.      To validate is to carry out procedures which show to what
00900	degree X, or its consequences, correspond with facts of  observation.
01000	In the case of an interactive simulation model we can compare samples
01100	of the model's I-O pairs with samples of I-O pairs from  the  model's
01200	subject, namely, naturally occurring paranoid processes in humans.
01300		Since samples of I-O behavior from the model and its  subject
01400	are  being compared, one can always question whether the human sample
01500	is authentic, i.e., representative of the process being modelled.
01600	Assuming  that it has been so judged, discrepancies in the comparison
01700	reveal what is not sufficiently understood and must  be  modified  in
01800	the model. After modifications are carried out, a fresh comparison is
01900	made and successive cycles of this kind are  made  in  attempting  to
02000	gain   convergence.   Such  a  method  of  successive  approximations
02100	characterizes a progressive (in contrast to  a  stationary)  research
02200	program.
02300		Once a simulation model reaches a stage of intuitive adequacy
02400	for  the  model  builders,  they  must  consider using more stringent
02500	evaluation procedures relevant to the model's purposes. For  example,
02600	if the model is to serve as a training device, then a simple
02700	evaluation of its pedagogic effectiveness would be sufficient. But
02800	when the model is proposed as an explanation of a symbolic process,
02900	more is demanded  of  the  evaluation  procedure.   In  the  area  of
03000	simulation  models,  Turing's  test  has  often  been  suggested as a
03100	validation procedure (Abelson, 1968).
03200		It is very easy to become confused about Turing's  Test.   In
03300	part  this  is  attributable  to  Turing  himself  who introduced the
03400	now-famous imitation game in a paper entitled COMPUTING MACHINERY AND
03500	INTELLIGENCE  (Turing,1950).  A careful reading of this paper reveals
03600	there are actually two imitation games, the second of which is
03700	commonly called Turing's test.
03800		In  the  first  imitation  game  two  groups of judges try to
03900	determine which of two interviewees is a woman when one  is  a  woman
04000	and the other is either (a) a man, or (b) a computer.   Communication
04100	between judge and interviewee  is  by  teletype.      Each  judge  is
04200	initially  informed that one of the interviewees is a woman and one a
04300	man who will pretend to be a woman. After the interview,  judges  are
04400	asked the "woman-question," i.e., which interviewee was the woman?
04500	Turing does not say what else is told to the judge but one can assume
04600	the judge is NOT told that one of the interviewees is a computer. Nor
04700	is he asked to determine which interviewee is human and which is  the
04800	computer.   Thus,   the   first   group   of  judges  interviews  two
04900	interviewees:  a woman, and a man pretending to be a woman.
05000		The  second  group  of  judges  is  given  the  same  initial
05100	instructions, but unbeknownst to them, the two  interviewees  consist
05200	of  a  woman  and  a  computer  programmed to imitate a woman.   Both
05300	groups of judges play this game, and are asked the  "woman-question",
05400	until sufficient statistical data are collected to show how often the
05500	right identification is made.  The crucial question then is:   do the
05600	judges  decide  wrongly AS OFTEN when the game is played with man and
05700	woman as when it is played with a computer substituted for the man?
05800	If  so, then the program is considered to have succeeded in imitating
05900	a woman to the same degree as the man imitating a  woman.   In  being
06000	asked  the  woman-question, judges are not required to identify which
06100	interviewee is human and which is machine.
06200		Turing  then proposes a variation of the first game, a second
06300	game in which one interviewee is a man and one  is  a  computer.  The
06400	judge  is asked the "machine-question": which is the man and which is
06500	the machine? It is this second game which is commonly thought
06600	of as Turing's test.
06700		In   the   course  of  testing  our  simulation  of  paranoid
06800	linguistic behavior in a psychiatric interview, we conducted a number
06900	of Turing-like indistinguishability tests (Colby, Hilf, Weber and
07000	Kraemer, 1972). The tests were "Turing-like" in that, while they were
07100	conversational  tests,  they  were  not  exactly  the games described
07200	above.  As an experimental design, Turing's games are unsatisfactory.
07300	There  exist no known experts for making judgements along a dimension
07400	of womanliness, the dimension is dichotomous (if it is not  a  woman,
07500	it  is  a  man),  and  the ability of the man to deceive introduces a
07600	confounding variable.  In  designing  our  tests  we  were  primarily
07700	interested in learning more about developing the model and we did not
07800	believe the simple machine-question would  contribute  to  this  end.
07900	Subsequent experience, which will be reported shortly, supported this
08000	belief.
08100	
08200	6.2 METHOD
08300		To gather  data  we  used  a  technique  of  machine-mediated
08400	interviewing  (Hilf,  Colby, Smith, Wittner, and Hall, 1971) in which
08500	the participants communicate by means of  teletypes  connected  to  a
08600	computer  programmed  to  store  each message in a buffer until it is
08700	sent  to  the  receiver.    The  technique   eliminates   para-   and
08800	extralinguistic  features found in the usual vis-a-vis interviews and
08900	in teletyped interviews where the participants communicate  directly.
09000	Judgements  of  "paranoidness"  in machine-mediated interviews have a
09100	high degree of reliability (94% agreement, see Hilf, 1972).
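The buffered relay described above can be sketched as follows. This is a hypothetical illustration, not the program used in the study: the class name `MessageRelay`, the carriage return as end-of-message marker, the backspace as delete key, and the in-memory queue standing in for the receiver's teletype are all our assumptions. What it shows is the property the technique relies on, as we read it: the receiver never sees characters as they are typed, so typing speed, pauses and corrections are hidden and only the finished message arrives.

```python
from collections import deque


class MessageRelay:
    """Hypothetical machine-mediated teletype link.

    Keystrokes from the sender are held in a buffer; only when the
    sender ends the message is the complete text delivered.
    """

    END_OF_MESSAGE = "\r"  # assumption: message terminator
    ERASE = "\b"           # assumption: character-delete key

    def __init__(self):
        self._buffer = []
        self.delivered = deque()  # stands in for the receiver's teletype

    def keystroke(self, ch):
        if ch == self.END_OF_MESSAGE:
            message = "".join(self._buffer)
            self._buffer.clear()
            if message:
                self.delivered.append(message)
        elif ch == self.ERASE:
            if self._buffer:
                self._buffer.pop()  # the receiver never sees the correction
        else:
            self._buffer.append(ch)


relay = MessageRelay()
for ch in "I AM UPZ\bSET\r":  # a typo corrected before sending
    relay.keystroke(ch)
print(list(relay.delivered))  # ['I AM UPSET']
```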
09200		Using this technique, a  psychiatrist-judge  interviewed  two
09300	patients, one after the other.   In half the runs the first interview
09400	was with a human paranoid patient and in half the first was with  the
09500	paranoid  model.  Two  versions  (weak  and  strong)  of  PARRY  were
09600	utilized.  The strong version's affect-variables started at a  higher
09700	level  and  increased  more  rapidly.  Also it exhibited a delusional
09800	system. The weak version behaved suspiciously but lacked systematized
09900	delusions. When the model was the interviewee, Sylvia Weber
10000	monitored the input expressions from the interview judge for
10100	inadmissible teletype characters and misspellings. (Algorithms are
10200	very sensitive to the slightest of such errors.) If these were found,
10300	she retyped the input expression correctly to the program. Otherwise
10400	the judge's message was sent on to the model. The monitor did not
10500	modify or edit PARRY's output expressions, which were sent directly
10600	back to the judge.     When  the  interviewee  was  an  actual  human
10700	patient,  the dialogue took place without a monitor in the loop since
10800	we did not feel the asymmetry to be significant.
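The asymmetric routing just described (the judge's input to the model passes through a monitor who may retype it, the model's output returns unedited, and human-patient runs have no monitor at all) can be sketched as below. The function names, the admissible character set and the toy model are assumptions for illustration; in the study the monitor was a person, which we represent as a caller-supplied function.

```python
import string
from typing import Callable

# Assumption: the text does not list the characters the model accepts;
# upper-case letters, digits, space and light punctuation are a guess.
ADMISSIBLE = set(string.ascii_uppercase + string.digits + " .,?'-")


def example_monitor(judge_input: str) -> str:
    """Stand-in for the human monitor: upper-cases the text and drops
    inadmissible characters. (The real monitor also corrected
    misspellings by retyping, which this sketch does not attempt.)"""
    return "".join(ch for ch in judge_input.upper() if ch in ADMISSIBLE)


def route_to_model(judge_input: str,
                   monitor: Callable[[str], str],
                   model: Callable[[str], str]) -> str:
    # Input side: the monitor sees every message and may retype it.
    corrected = monitor(judge_input)
    # Output side: the model's reply goes back to the judge unedited.
    return model(corrected)


def route_to_patient(judge_input: str, patient: Callable[[str], str]) -> str:
    # Human-patient runs: no monitor anywhere in the loop.
    return patient(judge_input)


def echo_model(text: str) -> str:
    """Toy stand-in for PARRY, used only to show the routing."""
    return "YOU SAID: " + text


print(route_to_model("how long have you been in the hospital?#",
                     example_monitor, echo_model))
```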
10900	
11000	6.3 PATIENTS
11100		The  human  patients  (N=3  with  one patient participating 6
11200	times) were diagnosed as paranoid by  the  psychiatric  staff  of  an
11300	acute  ward in a psychiatric hospital.  The ward's chief psychiatrist
11400	selected the patients and asked them if  they  would  be  willing  to
11500	participate  in  a  study  of  psychiatric  interviewing  by means of
11600	teletypes.   He  explained  that  they  would  be  interviewed  by  a
11700	psychiatrist over a teletype.  I either sat with the patient while he
11800	typed or typed for him if he was unable to do so.   The  patient  was
11900	encouraged  to respond freely using his own words.     Each interview
12000	lasted 30-40 minutes.  Two patients were set up for each run  of  the
12100	experiment  to  guarantee  having  a  subject.     In  spite  of this
12200	precaution,  on  several  occasions  the  experiment  could  not   be
12300	conducted   because   of   the  patient's  inability  or  refusal  to
12400	participate. Also, computer breakdowns occurred early in some
12500	interviews, before enough I-O pairs had been collected for those
12600	interviews to be included in the statistical results.
12700	
12800	
12900	6.4 JUDGES
13000		Two  groups  of psychiatric judges were used.  One group, the
13100	"interview judges" (N=8) conducted the  machine-mediated  interviews.
13200	The  other  group,  the  "protocol  judges" (N=33) read and rated the
13300	interview protocols. From these two groups of judges we were able  to
13400	accumulate  a  large  number of observations (in the form of ratings)
13500	necessary for the required statistical tests.   The interview  judges
13600	who  volunteered  to  participate  were  psychiatrists experienced in
13700	private, outpatient and hospital practice. Each was told he would  be
13800	interviewing    hospitalized   patients   by   means   of   teletyped
13900	communication and that this technique was  being  used  to  eliminate
14000	para- and extralinguistic cues. He was not told until after the
14100	two interviews that one of the patients might be  a  computer  model.
14200	While  the  interview judges were aware a computer was involved, none
14300	knew we had  constructed  a  paranoid  simulation.   Naturally,  some
14400	interview  judges  suspected  that a computer was being used for more
14500	than message transmission.
14600	
14700		Each interview judge was asked to rate the degree of paranoia
14800	he  detected  in the patient's responses on a 0-9 scale, 0 meaning no
14900	paranoia and 9 meaning extreme paranoia.  The judge made two  ratings
15000	after  each  I-O pair in the interview.  The first rating represented
15100	his estimate of the degree of "paranoidness" in a particular response
15200	(designated  as  "Response"  in  the  interview extracts below).  The
15300	second rating represented the judge's global estimate of the  overall
15400	degree  of  "paranoidness" of the patient resulting from the totality
15500	of the patient's responses up to that point (designated as  "Patient"
15600	in  the interview extracts below). The interview judge's ratings were
15700	entered on the teletype and saved on  a  disc  file  along  with  the
15800	interview.     Franklin   Dennis   Hilf  sat  with  the  interviewing
15900	psychiatrist during both interviews.  Each interview judge was  asked
16000	not  only  to rate the patient's response but to give his reasons for
16100	these ratings.  His reasons and other comments were tape recorded  as
16200	the interview progressed.
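One way to picture the record produced for each I-O pair (two 0-9 ratings, "Response" and "Patient", saved with the interview) is the small data structure below. The class name `RatedPair` and the tab-separated line format are hypothetical; the text says only that ratings were saved on a disc file along with the interview. The example values are pair {1} of Excerpt 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedPair:
    """One I-O pair with the interview judge's two 0-9 ratings."""
    number: int
    question: str
    answer: str
    response: int  # paranoia judged in this particular reply
    patient: int   # global estimate from all replies so far

    def __post_init__(self):
        for name in ("response", "patient"):
            value = getattr(self, name)
            if not 0 <= value <= 9:
                raise ValueError(f"{name} rating must be 0-9, got {value}")

    def to_line(self) -> str:
        # Hypothetical tab-separated layout for the disc file.
        return "\t".join([str(self.number), self.question, self.answer,
                          str(self.response), str(self.patient)])


pair = RatedPair(1, "HOW DID YOU COME TO BE IN THE HOSPITAL?",
                 "I AM UPSET", response=1, patient=5)
print(pair.to_line())
```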
16300	
16400		Here  are  a few excerpts from the interviews to indicate the
16500	nature of the data collected.  The I-O pairs of the interview  appear
16600	in the left column along with the judge's ratings in brackets. A
16700	transcription of the judge's tape-recorded comments appears in the
16800	right column.
16900	
17000	EXCERPT  1:
17100	The following interview is with the strong version of the paranoid
17200	model.
17300	
17400	     	I-O Pairs			Comments By Judge  1
17500	
17600	{1} Dr. - HOW DID YOU COME TO BE
17700	          IN THE HOSPITAL?
17800	    Pt. - I AM UPSET                I associate being upset and 
17900					    agitated more with paranoia 
18000					    going on in the present.
18100	      {RESPONSE 1  PATIENT 5}
18200	{2} Dr. - HOW LONG HAVE YOU BEEN
18300	          IN THE HOSPITAL?
18400	    Pt. - ABOUT A WEEK              An acute problem.  Again some-
18500	                                    thing I associate with 
18600					    paranoia.
18700	      {RESPONSE 1 PATIENT 5}
18800	{3} Dr. - IN WHAT WAY ARE YOU
18900	          UPSET?
19000	    Pt. - PEOPLE MAKE ME NERVOUS    Right for paranoia.  Maybe an
19100					    advanced anxiety state.
19200	      {RESPONSE 5 PATIENT 5}
19300	{4} Dr. - IN WHAT WAY DO THEY
19400	          MAKE YOU NERVOUS?
19500	    Pt. - PEOPLE GIVE ME A FUNNY    Funny look suggests an idea of
19600	          LOOK SOMETIMES            reference which is in same ball
19700					    park as paranoia. Whether there
19800					    are overt delusions or not we
19900					    still don't know.
20000	      {RESPONSE 6  PATIENT 5}
20100	{5} Dr. - IS THERE ANYTHING ABOUT  
20200	          PEOPLE IN PARTICULAR?
20300	    Pt. - CERTAIN PERSONS          An interesting clue.  We'll
20400					   have to find out what he 
20500	      {RESPONSE 2  PATIENT 6}      thinks about these people.
20600	{6} Dr. - WHO ARE THE CERTAIN 
20700	          PERSONS AND WHAT IS IT
20800	          THAT THEY DO?
20900	    Pt. - DO YOU KNOW HOW THE      Bringing up the underworld 
21000	          UNDERWORLD OPERATES?     suggests a delusional system.
21100					   We don't know whether he is
21200		{RESPONSE 9 PATIENT 9}     being worked against or
21300					   whether the underworld is 
21400					   on his side.
21500	
21600	{7} Dr. - HOW DO YOU MEAN?
21700	    Pt. - SHOULDN'T I BE? 	   This question didn't work too well.
21800	
21900	      {RESPONSE 1 PATIENT 9}
22000	{8} Dr. - WHY DID YOU ASK ABOUT
22100	          THE UNDERWORLD?
22200	    Pt. - WHAT INFORMATION DO      This is a kind of evasiveness
22300	          YOU HAVE ON THE UNDER-   about my evasiveness. Common
22400	          WORLD?                   with paranoia in my experi-
22500					   ence. Pretty characteristic.
22600	
22700	      {RESPONSE 6 PATIENT 9}
22800	{9} Dr. - I DON'T KNOW ABOUT THE
22900	          UNDERWORLD BUT WHEN
23000	          YOU MENTIONED THE UNDER-
23100	          WORLD I THOUGHT YOU
23200	          MUST HAVE SOME CONTACT
23300	          WITH IT AND THAT'S
23400	          WHAT I WOULD LIKE TO
23500	          KNOW MORE ABOUT.
23600	    Pt. - I TRY TO AVOID THE UNDER-  This definitely feels paranoid.
23700	          WORLD.                     It has that feeling I associate
23800	             			     with twisting. Avoiding answering
23900	      {RESPONSE 9 PATIENT 9}         directly or going into detail.
24000					     Being suspicious of my ques-
24100					     tioning. I'm already involved
24200					     in the patient's delusional
24300					     system.
24400	
24500	{10} Dr. - HAVE YOU BEEN ACTIVELY
24600	           TRYING TO AVOID THE
24700	           UNDERWORLD?  
24800	     Pt. - NO ONE HAS ANY POWER      This response is a non-sequitur.
24900	           OVER GANGSTERS            It's possible he's become so
25000					     defensive about the under-
25100	       {RESPONSE 2 PATIENT 9}        world he's decided not to
25200					     answer directly.
25300	
25400	{11} Dr. - WHO BROUGHT YOU INTO
25500	           THE HOSPITAL?
25600	     Pt. - THE POLICE                Police often bring paranoid
25700					     patients into the hospital.
25800	       {RESPONSE 1 PATIENT 9}        It is the truth or could 
25900					     be the truth for this
26000					     patient.
26100	
26200	{12} Dr. - WHY DID THEY BRING YOU
26300	           IN?
26400	     Pt. - COPS DON'T DO THEIR JOB   I've heard that a number of
26500	                         	     times from patients.
26600	       {RESPONSE 9 PATIENT 9}        
26700	 
26800	{13} Dr. - WELL I HOPE YOU DIDN'T
26900	           JUST LIE DOWN AND LET
27000	           THEM DRAG YOU IN.
27100	     Pt. - COPS ARREST THE WRONG     Another paranoid response.
27200	           PEOPLE
27300	
27400	       {RESPONSE 9 PATIENT 9}
27500	
27600	{14} Dr. - BYE.
27700	     Pt. - GOOD BYE
27800	       {RESPONSE 1 PATIENT 9}
27900	
28000	EXCERPT 2:
28100	This excerpt is from an interview with the weak (less paranoid)
28200	version of the model.
28300		I-O Pairs		    Comments By Judge 2
28400	{9} Dr. - TELL ME SOMETHING ABOUT
28500	          YOURSELF
28600	    Pt. - I WENT TO THE TRACK AT     Response says so little it's
28700	          BAY MEADOWS A WHILE        hard to rate it. It may be 
28800	          BACK         		     guarded. Doesn't change my
28900	                                     estimate of patient from 
29000					     previous responses.
29100	      {RESPONSE 0 PATIENT 2}
29200	{10} Dr. - WHAT HAPPENED THERE?
29300	     Pt. - THERE'S A LOT OF CROOKED   Sounds mildly paranoid or un-
29400	           GAMBLING GOING ON THERE    realistically suspicious.
29500	       {RESPONSE 0 PATIENT 2}
29600	
29700	{11} Dr. - WHY DO YOU THINK THE
29800	           GAMBLING IS CROOKED?
29900	     Pt. - I HAVE WORRIED THAT         I'd hate to have to justify
30000	           GANGSTERS HAVE TOO MUCH     that one  as 4 compared to
30100	           POWER IN THIS COUNTRY       some other responses. Maybe
30200					       I'm not rating that statement
30300	                                       alone, it's in combination with
30400		{RESPONSE 4 PATIENT 4}	      others, like that it's upsetting.
30500					      Alone I would give it a 3.
30600	
30700	{12} Dr. - WOULD YOU TELL ME WHAT      
30800	           OTHER THINGS WORRY YOU?
30900	     Pt. - I THOUGHT I TOLD YOU        "Yes you did." Irritation but
31000	           PEOPLE GET ON MY NERVES     not paranoid response. Nothing
31100					       seems to lower my previous
31200	       {RESPONSE 1 PATIENT 3}          rating of patient.
31300	
31400	{13} Dr. - HOW DO YOU LIKE THE
31500	           HOSPITAL?
31600	     Pt. - I DON'T COMMUNICATE          (nothing relevant)
31700	           VERY MUCH WITH OTHERS
31800	       {RESPONSE 0 PATIENT 3}
31900	
32000	{14} Dr. - DO OTHER PEOPLE AT THE 
32100	           HOSPITAL GET ON YOUR
32200	           NERVES?
32300	     Pt. - I SHOULDN'T HAVE COME        (nothing relevant)
32400	           HERE
32500	
32600	       {RESPONSE 2 PATIENT 3}
32700	
32800	{15} Dr. - WHY DO YOU SAY YOU 
32900	           SHOULDN'T HAVE COME?
33000	     Pt. - THEY MADE ME COME HERE        "Oh, they did hum?" Maybe a
33100					         statement of fact he was
33200	       {RESPONSE 4 PATIENT 3}            committed or pressured into  
33300	   				         going.Has ring of other
33400					         people controlling him,
33500					         having a lot of influence
33600					         over him.
33700	
33800	{16} Dr. - THANK YOU FOR TALKING
33900	          WITH ME, BYE
34000	     Pt. - GOOD BYE                      In general he comes across 
34100						 as terse, guarded guy. Brief
34200	       {RESPONSE 0 PATIENT 3}            spoken guy who occasionally
34300						 seems to be angered by
34400						 questions. I'm inferring 
34500					         there's a good bit of
34600						  guarding.
34700						 The quality of some of his
34800						 comments cause me to feel he  
34900						 is moderately paranoid.
35000	
35100	EXCERPT  3:
35200	This excerpt is from an interview with a hospitalized paranoid patient.
35300	     I-O Pairs				      Comments By Judge  3
35400	{15} Dr. - I UNDERSTAND HOW YOU
35500	           MIGHT BE AFRAID, CAN YOU
35600	           TELL ME A BIT ABOUT HOW
35700	           YOU ARE DEPRESSED?
35800	     Pt. - THEY'VE SENT ALL THESE         Would be important to know,
35900	           HELL'S ANGELS UP HERE          is this really true or an
36000	           AND THEY HAVE BEEN ON          exaggeration or a delusion?
36100	           MY BACK FOR THE PAST SIX       Seems preoccupied with the
36200	           MONTHS                         threat to his life and
36300						  unwilling to talk about
36400						  anything else but the 
36500						  threat the Hell's
36600						  Angels pose.
36700	      {RESPONSE 4 PATIENT 4}
36800	
36900	{16} Dr. - WHAT DO YOU MEAN THEY
37000	           SENT THE HELL'S ANGELS UP
37100	           HERE?
37200	     Pt. - FROM DIFFERENT SECTORS OF      Vagueness a bit alarming.
37300	           MOTORCYCLE GANGS. A LOT OF
37400	           THEM ARE SERVICE CONNECTED
37500	
37600	       {RESPONSE 6 PATIENT 4}
37700	
37800	{17} Dr. - YOU MEAN YOU FEEL IN 
37900	           DANGER EVEN HERE IN THE
38000	           HOSPITAL?
38100	     Pt. - YES      			  If it's true what has gone
38200						  on before, it must be a
38300						  frightening situation
38400						  for the patient.
38500	      {RESPONSE 6 PATIENT 4}
38600	
38700	{18} Dr. - THAT MUST REALLY BE AWFUL,  
38800	           HERE YOU COME TO THE HOSPITAL
38900	           FOR HELP AND YOU ARE STILL FRIGHTENED
39000	     Pt. - I CAN'T EVEN TRUST MY
39100	           DOCTOR ANYMORE                 Some heightening of feeling
39200						  this man is paranoid.
39300						  Genuinely frightened and 
39400	      {RESPONSE 6 PATIENT 5}              can count on no one for
39500			       			  help.
39600	
39700	{19} Dr. - WHAT WOULD BE OF HELP
39800	           TO YOU NOW?
39900	     Pt. - I'M GETTING SLEEPY NOW         Patient has organized a
40000	           AND HAVE TO GO TO BED.         system of people who will
40100	           THE NUMBER ONE GUY ALWAYS      harm him. The remark about
40200		   HAS FOUR AND FIVE LETTERS	  letters in the last name 
40300	           IN HIS LAST NAME.              gives some credence to 
40400	                                          idea this is delusional
40500						  system.
40600	
40700	        {RESPONSE 7 PATIENT 7}
40800	
40900	EXCERPT  4:
41000	This excerpt is from an interview by a different judge with the same
41100	patient as in excerpt 3.
41200	      I-O Pairs				Comments By Judge  4
41300	{1} Dr. - COULD YOU TELL ME A
41400	          LITTLE BIT ABOUT YOUR 
41500	          FEAR OF DYING AND IS THIS
41600	          SOMETHING YOU HAVE FELT
41700	          IN THE PAST?
41800	    Pt. - I HAVE FELT THIS FOR AL-      He responded appropriately.
41900	          MOST A YEAR                   Response is concrete. He
42000						doesn't tell me much
42100	     {RESPONSE 7 PATIENT 9}             about his fear of dying.
42200						 Somewhat guarded response.
42300	
42400	{2} Dr. - COULD YOU TELL ME A 
42500	          LITTLE MORE ABOUT YOUR
42600	          FEAR OF DYING AND HAVE
42700	          YOU ANY THOUGHTS ABOUT
42800	          HOW IT IS GOING TO HAPPEN
42900	          AND WHETHER ANYONE WANTS
43000	          TO HURT YOU?
43100	    Pt. - I WILL END UP IN THE SEA   	Somewhat strange  response.
43200	          AND THE HELL'S ANGELS WILL    Preoccupation with dying.
43300	          DO IT       			Hell's Angels symbolic,
43400						dangerous people in the
43500						culture. I doubt if he has
43600	      {RESPONSE 9 PATIENT 9}   		any contact with the Hell's
43700						Angels.
43800	
43900	{3} Dr. - COULD YOU TELL ME A 
44000	          LITTLE BIT ABOUT THE KINDS
44100	          OF CONTACT YOU HAVE HAD
44200	          WITH THE HELL'S ANGELS IN
44300	          THE PAST?
44400	    Pt. - I HAVE KNOWN SOME OF THEIR 	Answer hard to evaluate. He
44500	          DEALERS AND PUSHERS           may be telling the truth,
44600						it may be his fantasy. Maybe
44700						guy is in for drug addiction.
44800		{RESPONSE 6 PATIENT 9}		Somewhat concrete, guarded,
44900						and frightened.
45000	
45100	{4} Dr. - COULD YOU SAY A LITTLE
45200	          MORE ABOUT THE CIRCUMSTANCES
45300	          IN WHICH YOU HAVE KNOWN SOME 
45400	          OF THEIR DEALERS AND PUSHERS?
45500	    Pt. - THEY WERE MEMBERS OF MY    	It doesn't really answer the
45600	          COMMUNITY WHEN I GOT OUT      question, a little on a tan-
45700	          OF THE SERVICE THEY HAD       gent unconnected to the
45800	          BEEN MY FRIENDS FOR SO LONG   information I am asking. Does
45900						not tell me very much. Again
46000						guarded response.
46100	      {RESPONSE 6 PATIENT 8}
46200	
46300	{5} Dr. - DID YOU DEAL WITH THEM
46400	          YOURSELF AND HAVE YOU
46500	          BEEN ON DRUGS OR NAR-
46600	          COTICS EITHER NOW OR
46700	          IN THE PAST?
46800	    Pt. - YES I HAVE IN THE PAST     	To differentiate him from
46900	          BEEN ON MARIHUANA REDS        previous patient, at least
47000	          BENNIES LSD       		there is a certain amount
47100						of appropriateness to the
47200						answer although it doesn't
47300						tell me much about what I
47400	       {RESPONSE 3 PATIENT 7}		asked at least it's not
47500						bizarre. If I had him in my
47600						office I would feel con-
47700						fident I could get more
47800						information if I didn't
47900						have to go through the
48000						teletype. He's a little more
48100						willing to talk than the
48200						previous person. Answer
48300						to the question is fairly
48400						appropriate though not 
48500						extensive. Much less of a 
48600						flavor of paranoia than
48700						any of previous responses.
48800	
48900	{6} Dr. - COULD YOU TELL ME HOW      	
49000	          LONG YOU HAVE BEEN IN THE
49100	          HOSPITAL AND SOMETHING
49200	          ABOUT THE CIRCUMSTANCES
49300	          THAT BROUGHT YOU HERE?
49400	    Pt. - CLOSE TO A YEAR AND		Response somewhat appropriate 
49500	          PARANOIA BROUGHT ME 		but doesn't tell me much.
49600	          HERE				The fact that he uses the
49700						word paranoia in the way
49800						that he does without
49900	      {RESPONSE 5 PATIENT 7}		any other information,
50000						indicates maybe it's a label 
50100						he picked up on the ward 
50200	                                        or from his doctor.
50300						Lack of any kind of under-
50400						standing about  himself.
50500						Dearth, lack of information.
50600						He's in some remission. Seems
50700						somewhat like a put-on. Seems
50800						he was paranoid and is in 
50900						some remission at this time.
51000	
51100	{7} Dr. - COULD YOU SAY SOMETHING
51200	          NOW ABOUT YOUR PARANOID 
51300	          FEELINGS BOTH AT THE 
51400	          TIME OF ADMISSION AND
51500	          DO YOU HAVE SIMILAR FEELINGS
51600	          NOW AND IF SO HOW DO THEY 
51700	          AFFECT YOU?
51800	    Pt. - AT THE TIME OF ADMISSION	This response moves paranoia 
51900	          I THOUGHT THE MAFIA WAS  	back up. Stretching reality 
52000	          AFTER ME AND NOW ITS THE	somewhat to think Hell's Angels 
52100	          HELL'S ANGELS			are still interested in him.
52200						Somewhat bizarre in terms of 
52300	                                        content. Quite paranoid.
52400	      {RESPONSE 8 PATIENT 9}		Still paranoid. Gross and primitive
52500						responses. In middle of interview I
52600						felt patient was in touch but now
52700						responses have more concrete aspect.
52800	
52900	{8} Dr. - DO YOU HAVE ANY THOUGHT
53000	          AS TO WHY THESE TWO
53100	          GROUPS WERE AFTER YOU?
53200	    Pt. - BECAUSE I STOPPED SOME 	Response seems far fetched 
53300	          OF THEIR DRUG SUPPLY		and hard to believe unless 
53400						he was a narcotic agent which 
53500						I doubt. Sounds somewhat 
53600	      {RESPONSE 9 PATIENT 9}		grandiose, magical, paranoid
53700						flavor, in general indicates 
53800						he's psychotic, paranoid 
53900						schizophrenic with delusions  
54000						about these two groups and 
54100						I wouldn't rule out
54200						some hallucinations as well.
54300						Appropriateness of response 
54400						answers question in concrete 
54500						but unbelievable way.
54600	
54700	6.5 ANALYSIS (1)
54800		Names of potential protocol judges (N=105) were selected from
54900	the 1970 American Psychiatric Association Directory using a table  of
55000	random  numbers. They were initially not informed that a computer was
55100	involved.  (After the experiment,  the  participating  judges  (N=33)
55200	were  fully  informed  as  to its purpose and results.) The 105 names
55300	were divided into eight groups.  Each member  of  a  group  was  sent
55400	transcripts  of three interviews along with a cover letter requesting
55500	his  participation  in  the  experiment.  The  interview  transcripts
55600	consisted of:
55700		1) An interview conducted by one of the eight judges with the
55800		  paranoid model,
55900		2) An interview conducted by the same interview judge with a 
56000		  human paranoid patient, and
56100		3) An interview conducted by a different psychiatrist with a 
56200		  human patient who was not clinically paranoid.
56300	
56400	After each input-output pair in the transcripts there were two  lines
56500	of rating numbers on which the protocol judges could circle the numbers
56600	corresponding to their ratings of both the previous response of the
56700	patient and an overall evaluation of the patient on the paranoid
56800	continuum. Thirty-three protocol judges returned the rated protocols
56900	properly filled out, and all were used in our data.
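The random selection and grouping of protocol judges can be sketched as follows. This is an illustration under stated assumptions: the study used a table of random numbers, whereas the sketch uses Python's seeded pseudo-random shuffle, and dealing names round-robin into eight groups (presumably one group per interview judge, whose transcripts that group would receive) is our choice. The candidate names are placeholders, not data.

```python
import random


def assign_protocol_judges(names, n_groups=8, seed=None):
    """Shuffle candidate names and deal them into n_groups groups.

    Sketch only: a seeded shuffle stands in for the table of random
    numbers, and round-robin dealing is an assumption.
    """
    rng = random.Random(seed)
    shuffled = list(names)
    rng.shuffle(shuffled)
    return [shuffled[k::n_groups] for k in range(n_groups)]


candidates = [f"candidate_{i:03d}" for i in range(105)]  # placeholder names
groups = assign_protocol_judges(candidates, seed=1973)
print([len(g) for g in groups])  # [14, 13, 13, 13, 13, 13, 13, 13]
```

Dealing round-robin keeps the group sizes within one of each other, which matters when 105 names do not divide evenly into eight groups.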
57000	
57100		The  interviews  with  nonparanoid  patients were included to
57200	control for the  hypothesis  that  any  teletyped  interview  with  a
57300	patient  might  be  judged  "paranoid". However, virtually all of the
57400	ratings of the nonparanoid interviews were zero for paranoia. Hence
57500	the hypothesis was falsified.
57600	
57700	
57800		The first index of indistinguishability between model and
57900	patient was simple: namely, the final overall rating given the
58000	patient and the model. The question was: "Which was rated as being
58100	more paranoid, the patient, the model, or neither?" (See Table 1.)
58200	The protocol judges were more likely than the interview judges to
58300	distinguish the overall paranoid level of the model and the patient:
58400	in 37.5% of the paired interviews the interview judges gave tied
58500	scores to the model and the patient, compared with only 9% of the
58600	protocol judges. Of the 35 non-tied paired ratings, 15 rated the model
58700	as being more paranoid. If p is the theoretical probability of a judge
58800	judging the model more paranoid than a human paranoid patient, the 95%
58900	confidence interval for p is 0.27 to 0.59. Since p=0.5 indicates
59000	indistinguishability of the overall ratings of model and patient, and
59100	this interval around our observed p=15/35=0.43 contains 0.5, the results
59200	support the claim that the model is a successful simulation of a paranoid patient.
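The interval above can be reproduced with the usual normal-approximation (Wald) interval for a binomial proportion. The text does not name its method; we assume this one because it matches the reported endpoints: it gives 0.265 to 0.593, and the lower limit becomes 0.266 (hence 0.27) if p is first rounded to 0.43.

```python
import math


def wald_interval(successes, trials, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p_hat = successes / trials
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, p_hat - half_width, p_hat + half_width


p_hat, low, high = wald_interval(15, 35)  # 15 of 35 non-tied pairs
print(f"p = {p_hat:.3f}, 95% CI {low:.3f} to {high:.3f}")
# p = 0.429, 95% CI 0.265 to 0.593
print(low < 0.5 < high)  # True: 0.5 (indistinguishable) lies inside
```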
59300	
59400		Separate  analysis  of  the strong and weak versions of PARRY
59500	indicated that the strong version was  indeed  judged  more  paranoid
59600	than  the  paranoid  patients, the weak version less paranoid. Thus a
59700	change in the parameter structure of the paranoid  model  produced  a
59800	change  along  the  dimension  of  paranoid  behavior in the expected
59900	direction.
60000	
60100	(TABLE 1
60200		Relative final overall ratings of paranoid model vs. paranoid
60300	patient, indicating which was given the higher overall rating of paranoia
60400	at end of interview.)
60500		(INSERT TABLE 1 HERE)
60600	
60700	
60800	
60900	
61000	
61100	
61200	
61300	
61400	6.6 ANALYSIS (2)
61500		The second index of indistinguishability is a more sensitive
61600	measure based on the two series of response  ratings  in  the  paired
61700	interviews.   The   statistic  used  is  basically  the  standardized
61800	Mann-Whitney statistic (Siegel, 1956).
61900			(INSERT EQUATION HERE)
62000	
62100	where R is the sum of the ranks of the model's response ratings
62200	within the combined series of ratings given to model and patient, n
62300	the number of responses given by the model, and m the number of
62400	responses given by the patient. If the ratings given by a judge are
62500	randomly allocated to model and patient, i.e., model and patient are
62600	indistinguishable in response ratings, Z has expected value 0 and unit standard deviation. If higher
62700	ratings  are  more  likely to be assigned to the model, Z is positive
62800	and  conversely, negative values of Z indicate greater likelihood  of
62900	assigning  higher  ratings to the patient. Each judge in evaluating a
63000	pair of interviews generates a single value of Z.
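The equation referred to above is presumably the usual large-sample standardization of the rank sum, Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12), which has the properties just described (mean 0 and unit standard deviation under random allocation, positive when the model tends to receive the higher ratings). That form is an assumption, not something the source confirms. A minimal Python sketch of it follows, assuming tied ratings receive average ranks and omitting any tie correction to the variance; the function names and the sample ratings are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank pooled values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_z(model_ratings, patient_ratings):
    """Standardized rank-sum statistic; positive when the model tends to be rated higher."""
    n, m = len(model_ratings), len(patient_ratings)
    ranks = average_ranks(list(model_ratings) + list(patient_ratings))
    r = sum(ranks[:n])  # R: rank sum of the model's ratings within the pooled series
    expected = n * (n + m + 1) / 2
    sd = sqrt(n * m * (n + m + 1) / 12)  # no tie correction (simplifying assumption)
    return (r - expected) / sd


# Hypothetical paranoia ratings from one judge for one pair of interviews.
print(round(mann_whitney_z([3, 5, 6, 7], [2, 4, 4, 6, 1]), 3))
```

Each judge's pair of interviews yields one such Z, so the per-judge values can then be averaged across judges as in the analysis that follows.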
63100	
63200		The overall mean of the Z scores was +0.044 with a standard
63300	deviation of 1.68 (df=40). Thus the overall 95% confidence interval for
63400	the asymptotic mean value of Z is -0.485 to +0.573. The range of Z
63500	values is -3.8 to +4.46. The width of the confidence interval reflects
63600	the large variance, which is mainly due to the contrast between the
63700	weak and strong versions (see TABLES 2 and 3). Once again the strong
63800	version of the model was rated more paranoid than the patients, and
63900	the weak version less paranoid.
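The width of this interval can be checked from the summary figures. A minimal Python sketch, assuming a Student-t interval for the mean of the 41 Z scores (df = 40) with the two-sided 5% critical value 2.021 taken from standard tables rather than computed: the half-width comes to about 0.53, matching the reported interval. The sketch centers the interval at +0.044, the midpoint of the reported interval and also the count-weighted average of the group means given later in this section; the function name is hypothetical.

```python
from math import sqrt


def mean_confidence_interval(mean, sd, n, t_crit):
    """Two-sided interval mean +/- t_crit * sd / sqrt(n) for a mean estimated from n values."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width


# t_crit: two-sided 5% point of Student's t with 40 df, about 2.021 (assumed from tables).
low, high = mean_confidence_interval(0.044, 1.68, 41, 2.021)
print(f"{low:.3f} to {high:.3f}")
```

This gives roughly -0.486 to +0.574; the last-digit difference from the reported figures is within rounding of the inputs.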
64000	
64100		(INSERT TABLE 2)
64200		(SUMMARY STATISTICS OF Z RATINGS BY GROUP)
64300	
64400	
64500	
64600	
64700	
64800	
64900	
65000	
65100	
65200		It  is  not  surprising that results using the two indices of
65300	indistinguishability are  parallel,  since  the  indices  are  highly
65400	interrelated. The mean Z value for the 15 interviews on which the
65500	model was rated more paranoid was +1.28; for the 6 on which model and
65600	patient tied, +0.41; and for the 20 on which the patient was rated more
65700	paranoid, -0.993. Z was positive in 6 of the interviews in which the
65800	patient received a higher overall rating than the model, and negative
65900	in 2 of those in which the model was rated more paranoid.
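As an arithmetic check, weighting each group's mean Z by its number of interviews should recover the overall mean for all 41 paired interviews. A minimal Python sketch using only the counts and group means stated in this paragraph (the variable names are illustrative):

```python
# (number of interviews, mean Z) for the three outcome groups reported in the text.
groups = {
    "model rated more paranoid": (15, 1.28),
    "tie": (6, 0.41),
    "patient rated more paranoid": (20, -0.993),
}

total = sum(count for count, _ in groups.values())
pooled_mean = sum(count * mean_z for count, mean_z in groups.values()) / total
print(total, round(pooled_mean, 3))
```

The pooled mean comes to about +0.044 over 41 interviews; because the group means are themselves rounded, agreement is expected only to about three decimal places.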
66000	
66100	(INSERT TABLE 3)
66200	(Analysis of Variance of Z Ratings)
66300	
66400	
66500	
66600	
66700	
66800	
66900	
67000	
67100	
67200	
67300	
67400	
67500	
67600		It  is  worth emphasizing that these tests invited refutation
67700	of the model: their experimental design put the model in jeopardy of
67800	falsification. If the paranoid model had not survived these tests,
67900	that is, if expert judges had not considered it paranoid, or if there
68000	had been no correlation between the weak and strong
68100	versions of the model and the severity ratings of the judges, then no
68200	claim regarding the success of the simulation could be made. Survival
68300	of potentially falsifying tests constitutes a validating step  for  a
68400	model.
68500	
68600	6.7 ANALYSIS (3) THE MACHINE QUESTION
68700		People have long wondered how to distinguish
68800	a man from an imitation of a man. The Greeks made statues so
68900	lifelike, it is said, that they had to be chained down to keep them from
69000	walking  away.  To distinguish a man from a statue, Galileo suggested
69100	tickling each with a feather.  To distinguish a man  from  a  machine
69200	Descartes  proposed  conversational  tests which the machine, lacking
69300	the  ability  to  make  appropriate  replies,  would  fail.  Turing's
69400	imitation  games  have  been  discussed  on  p.000.  As heirs to this
69500	tradition, we perhaps inevitably became curious about how judges using
69600	transcripts might answer the machine-question: which interviewee
69700	is the human and which is the computer model?
69800		To  ask  the machine-question, we sent interview transcripts,
69900	one with a patient and one with PARRY, to 100 psychiatrists  randomly
70000	selected from the Directory of American Specialists and the Directory
70100	of the American Psychiatric Association.  Of the 41 replies, 21 (51%)
70200	made the correct identification while 20 (49%) were wrong. Based on this
70300	random sample of 41 psychiatrists, the 95% confidence interval for the
70400	percentage correct is 35.9% to 66.5%, consistent with chance guessing.
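The reported interval agrees with the simple normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 - p)/n), with p = 21/41; the chapter does not name its method, so this is an inference. A minimal Python sketch (function name hypothetical): for 21 of 41 it gives about 35.9% to 66.5%, and for the 32 of 67 computer scientists reported next it gives about 35.8% to 59.7%, that is, 36 to 60 after rounding.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, level=0.95):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


for label, right, replies in [("psychiatrists", 21, 41), ("computer scientists", 32, 67)]:
    low, high = wald_interval(right, replies)
    print(f"{label}: {100 * low:.1f}% to {100 * high:.1f}%")
```

An interval that contains 50% is what "consistent with chance guessing" means here. With samples of this size the Wald interval is only approximate; an exact binomial interval would differ slightly.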
70500		Psychiatrists   are   considered  expert  judges  of  patient
70600	interview behavior, but they are usually unfamiliar with computers. Hence we
70700	conducted  the  same  test  with  100  computer  scientists  randomly
70800	selected from the membership list of the  Association  for  Computing
70900	Machinery (ACM). Of the 67 replies, 32 (48%) were right and 35 (52%)
71000	were wrong. Based on this random sample of 67 computer scientists, the
71100	95% confidence interval ranges from 36% to 60%. Again the results are
71200	close to chance level.
71300		So neither the computer scientists nor the psychiatrists were
71400	able to distinguish transcripts of interviews with the model from
71500	transcripts of interviews with real patients at better than a
71600	chance level.
71700		But  what  do  we  learn from asking the machine-question and
71800	finding that the distinction is not made? What we would most like  to
71900	know  is  how  to improve the model.  Simulation models do not spring
72000	forth in a complete, perfect and final form; they must  be  gradually
72100	developed over time. Perhaps a correct model-patient distinction
72200	would be made if we allowed a large number of expert judges to
72300	conduct the interviews themselves rather than studying transcripts of
72400	other interviewers.  This would indeed indicate that the  model  must
72500	be improved. But unless we systematically investigated how the judges
72600	succeeded in making  the  discrimination,  we  would  not  know  what
72700	aspects  of the model to work on.  The logistics of such a design are
72800	immense, and obtaining a large number of judges for sound statistical
72900	inference would require an effort incommensurate with the information
73000	yielded.
73100	
73200	6.8 ANALYSIS (4)  MULTIDIMENSIONAL EVALUATION 
73300		A more efficient and informative way to use Turing-like tests
73400	is to ask judges to rate teletyped interviews along scaled dimensions.
73500	This might be called asking the "dimension question." One can then
73600	compare scaled ratings of the patients and
73700	the model in order to determine precisely where and by how much  they
73800	differ.   In constructing our model we strove for one which exhibited
73900	indistinguishability along  some  dimensions  and  distinguishability
74000	along others. That is, we wanted the model to converge on what it was
74100	intended to simulate and to diverge from that which it was not. Since
74200	a model represents a simplification and a partial approximation, a
74300	perfect fit is not to be expected.
74400		Paired-interview   transcripts   were  sent  to  another  400
74500	randomly selected psychiatrists with a request to rate the responses of
74600	the two "patients" along multiple dimensions. Twelve dimensions were
74700	used in all: linguistic noncomprehension, thought disorder, organic
74800	brain syndrome, bizarreness, anger, fear, ideas of reference,
74900	delusions, mistrust, depression, suspiciousness and mania. These are
75000	dimensions which psychiatrists commonly use in evaluating patients.
75100	The judges were divided into three groups, each group being assigned
75200	4 of the 12 dimensions, and each judge rated the responses of every
75300	I-O pair in the interviews along the four dimensions of that group.
75500	
75600			(INSERT TABLE 4 HERE)
75700	
75800		Table 4 shows there were significant differences, with  PARRY
75900	receiving   higher   scores   along   the  dimensions  of  linguistic
76000	noncomprehension, thought disorder, bizarreness, anger, mistrust  and
76100	suspiciousness.  On  the  delusion  dimension the patients were rated
76200	significantly higher.   There were no significant  differences  along
76300	the  dimensions  of organic brain syndrome, fear, ideas of reference,
76400	depression and mania.
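The dimension question amounts to a separate model-versus-patient comparison on each scale. A minimal Python sketch, assuming a two-sample t statistic per dimension in Welch's form, which does not assume equal variances (the chapter does not say which t test underlies Table 4). The ratings below are invented for illustration and are not the study's data, and a full analysis would also need degrees of freedom and a significance threshold.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (Welch)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical ratings on two of the twelve dimensions; not data from the study.
ratings = {
    "linguistic noncomprehension": {"model": [4, 6, 5, 7, 6], "patient": [2, 3, 1, 2, 3]},
    "fear": {"model": [3, 4, 2, 5, 3], "patient": [4, 3, 3, 4, 2]},
}

for dimension, groups in ratings.items():
    t = welch_t(groups["model"], groups["patient"])
    # Positive t: the model was rated higher than the patients on this dimension.
    print(f"{dimension}: t = {t:+.2f}")
```

Reporting one statistic per dimension, rather than a single pass/fail verdict, is what lets the comparison point to specific areas of the model.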
76500		Whereas    tests   asking   the   machine-question   indicate
76600	indistinguishability at  the  gross  level,  a  study  of  the  finer
76700	structure  of  the  model's  behavior  through  ratings  along scaled
76800	dimensions shows significant differences between patients and  model.
76900	These differences suggest which areas of the
77000	model should be modified to improve its performance. The graph of
77100	Fig. 2 shows that no modifications are necessary along the dimension
77200	of "organic brain syndrome," but PARRY's language comprehension
77300	clearly needs improvement. A subsequent dimensional test would then
77400	show whether improvement had occurred and by how much.
77500	Successive  identification  of particular areas of failure provides a
77600	type of sensitivity analysis  which  makes  clear  what  improvements
77700	should be pursued in developing more adequate model versions.
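The re-testing cycle described here can be expressed as a comparison of per-dimension gaps between successive model versions and the patients. A minimal Python sketch in which the version names, the dimensions chosen, and all mean ratings are hypothetical: for each dimension it reports how far each version's mean rating lies from the patients' mean and whether the revision narrowed that gap. A gap measured this way ignores sampling error; in practice each comparison would still be tested for significance, as in Table 4.

```python
# Hypothetical mean ratings per dimension; not data from the study.
patient_means = {"linguistic noncomprehension": 2.1, "organic brain syndrome": 1.5}
version_means = {
    "current model": {"linguistic noncomprehension": 5.0, "organic brain syndrome": 1.6},
    "revised model": {"linguistic noncomprehension": 3.4, "organic brain syndrome": 1.6},
}


def gaps(model_means, target_means):
    """Absolute distance from the patients' mean rating on each dimension."""
    return {d: abs(model_means[d] - target_means[d]) for d in target_means}


old = gaps(version_means["current model"], patient_means)
new = gaps(version_means["revised model"], patient_means)
for dimension in patient_means:
    change = old[dimension] - new[dimension]  # positive: the revision moved closer to patients
    print(f"{dimension}: gap {old[dimension]:.1f} -> {new[dimension]:.1f} ({change:+.1f})")
```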
77800	
77900		(INSERT FIG. 2 HERE)
78000	
78100	6.9 ANALYSIS (5)  A RANDOM MODEL
78200		Further evidence that  the  machine-question  is  too  low  a
78300	hurdle for a simulation model and too insensitive a test comes from
78400	the following experiment. For this test we constructed a random
78500	version of the paranoid model (RANDOM-PARRY) which used PARRY's
78600	output statements but emitted them at random, independently of what
78700	the interviewer said. Two psychiatrists conducted interviews with
78800	this model; transcripts of these were paired with patient interviews
78900	and sent to 200 randomly selected psychiatrists, who were asked both the
79000	machine-question and the dimension-question.   Of the 69  replies  to
79100	the  machine  question, 34 (49%) were right and 35 (51%) wrong. Based
79200	on this  random  sample  of  69  psychiatrists,  the  95%  confidence
79300	interval ranges from 37% to 61%, again indicating chance guessing. When
79400	a poor model, such as a  random  one,  passes  a  test,  it  strongly
79500	suggests the test is weak.
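The construction of RANDOM-PARRY can be sketched in a few lines. A minimal Python sketch that assumes only what the text states: the random version draws from PARRY's output statements regardless of the interviewer's input. The class name, the uniform choice, and the placeholder statements are hypothetical and are not the actual implementation. Seeding the generator makes a session reproducible, and because the input is never consulted, the sequence of replies is the same whatever the interviewer types.

```python
import random


class RandomResponder:
    """Emits stored output statements in random order, ignoring the interviewer's input."""

    def __init__(self, statements, seed=None):
        self.statements = list(statements)
        self.rng = random.Random(seed)

    def reply(self, interviewer_input):
        # The input is deliberately unused: the choice does not depend on it.
        return self.rng.choice(self.statements)


# Placeholder statements standing in for the model's output repertoire.
responder = RandomResponder(["STATEMENT A", "STATEMENT B", "STATEMENT C"], seed=1)
for question in ["How are you?", "Why are you in the hospital?"]:
    print(question, "->", responder.reply(question))
```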
79600	
79700		(INSERT TABLE 5 HERE)
79800	
79900		Although a distinction is not made when  the simple  machine-
80000	question is asked, definite distinctions ARE made when judgements are
80100	requested  along  specific  dimensions.    As  shown  in   Table   5,
80200	significant  differences  appear  along  the dimensions of linguistic
80300	noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
80400	rated  higher.   On  these  particular  dimensions we can construct a
80500	continuum in which the random version represents one extreme and the
80600	actual patients the other. Nonrandom PARRY lies between these
80700	two extremes, indicating that it performs significantly better than
80800	the random version but still requires improvement before it can be
80900	considered indistinguishable from patients along these
81000	dimensions. Table 6 presents t values for differences between mean
81100	ratings of PARRY and RANDOM-PARRY; the mean ratings themselves appear
81200	in Table 6 and Fig. 2.
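The continuum described above can be made quantitative by locating PARRY's mean rating on a dimension relative to the two extremes. A minimal Python sketch under a hypothetical convention: the position is the fraction of the distance from RANDOM-PARRY's mean to the patients' mean that PARRY covers, so 0 means no better than the random version and 1 means matching the patients. The numbers are invented, and the measure is undefined when the random version and the patients have equal means.

```python
def continuum_position(random_mean, model_mean, patient_mean):
    """0.0 = at the random version's mean rating, 1.0 = at the patients' mean rating."""
    return (random_mean - model_mean) / (random_mean - patient_mean)


# Hypothetical mean ratings on one dimension (not values from Table 6 or Fig. 2).
print(round(continuum_position(random_mean=6.0, model_mean=4.2, patient_mean=2.0), 2))
```

Tracking this position across model versions would show, dimension by dimension, how far each revision has moved PARRY away from the random extreme.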
81300	
81400		(INSERT TABLE 6 AND FIG 2 HERE)
81500	
81600		These studies show that a more useful way to apply Turing-like
81700	indistinguishability  tests  is  to ask expert judges to make ratings
81800	along multiple dimensions deemed essential to the model.    Thus  the
81900	model  can  serve  as  an  instrument for its own perfection.  A good
82000	validation procedure has criteria for better or worse approximations.
82100	Useful  tests do not necessarily prove a model; they probe it for its
82200	strengths and weaknesses and clarify what is to be done next  in  the
82300	way  of  modification  and repair. Simply asking the machine-question
82400	yields little information relevant to what  the  model  builder  most
82500	wants to know, namely, along which dimensions the model needs to be
82600	modified in order to improve its performance.
82700	
82800		To  conclude,  it  is  perhaps  historically significant that
82900	these tests were conducted at all. To my knowledge, no  one  to  date
83000	has  subjected  an  interactive  simulation  model  of human symbolic
83100	processes to multidimensional indistinguishability tests. These tests
83200	set a precedent and provide a standard against which competing models
83300	might be measured.